Stochastic local search algorithms are frequently used to numerically solve hard combinatorial
optimization or decision problems. We give numerical and approximate analytical descriptions of 
the dynamics of such algorithms applied to random satisfiability problems. We find two different 
dynamical regimes, depending on the number of constraints per variable: For low constraintness, 
the problems are solved efficiently, i.e. in linear time. For higher constraintness, the solution times 
become exponential. We observe that the dynamical behavior is characterized by a fast equilibration 
and fluctuations around this equilibrium. If the algorithm runs long enough, an exponentially rare 
fluctuation towards a solution appears. 
I. INTRODUCTION

The last years have seen a fruitful exchange between theoretical computer science and statistical mechanics [1, 2].
Due to the formal analogy between various combinatorial optimization problems and certain spin-glass models,
substantial progress in the understanding of hard combinatorial questions could be made by using tools which were
originally developed in the statistical mechanics of disordered systems.

The most striking results so far were obtained in the description of the solution-space structure of the random
satisfiability problem [3, 4, 5, 6, 7], of the number partitioning problem [8, 9], of vertex covers [10, 11, 12] or colorings
[13] of random graphs. In these cases, equilibrium methods from statistical mechanics can be applied directly, including
e.g. the replica and cavity approaches. The main result is that these models undergo phase transitions from an easily 
solvable, under-constrained phase to a hard, highly constrained one. The latter is characterized by the existence of 
glass-like states, i.e. the solution space is subdivided into a large number of disconnected clusters, and there are 
exponentially many excited states hindering even the best local algorithms from finding optimal solutions in sub- 
exponential time (where exponential means, here and in the following, exponential in the system size, as given e.g. 
by the number of discrete degrees of freedom or, in a more computer-science oriented language, in the number of bits 
needed to encode an instance of the problem under consideration). 

Up to now, much less is understood about the dynamical behavior of the algorithms which are used to solve these
combinatorial problems numerically. These, too, are known to undergo algorithm-dependent phase transitions from
regions of phase space where the problems are typically efficiently solvable to regions where solutions are exponentially
hard to construct. Some understanding has been obtained for heuristics, i.e. approximate algorithms running in linear
time, see e.g. [14, 15, 16], for complete solvers [17, 18, 19], which are guaranteed to find an optimal solution, and finally
for randomized versions of these complete algorithms [20]. The problem in analyzing algorithms is that they are
intimately related to non-equilibrium statistical mechanics, which is frequently much harder to handle technically.
In addition, algorithms are not forced to fulfill physical criteria like detailed balance, which again complicates the
analysis.

In this paper, we analyze a different class of algorithms: stochastic local search algorithms, in particular
variants of the so-called walk-SAT algorithm, which is one of the most popular and successful solvers for satisfiability
problems. Whereas the full problem is too hard to attack successfully by means of analytical tools, we will give some
approximation methods which allow us to draw a qualitative picture of how these algorithms solve an optimization
problem.

The paper is organized as follows: In Sec. II the considered models are introduced. We first introduce the random
$K$-satisfiability problem ($K$-SAT) and give an overview of the current state of knowledge. Then we introduce a second
model, the random $K$-XOR-satisfiability problem ($K$-XOR-SAT). Being in many aspects similar to $K$-SAT, it
has recently attracted some interest due to its better analytical tractability. In the last part of Sec. II we give a
short introduction to some stochastic local search algorithms, in particular to the famous walk-SAT algorithm which
will be analyzed in the present paper. We then show some numerical observations in Sec. III. These are analytically
explained in Secs. IV and V. The first of these two sections deals with the linear-time behavior, whereas the second
one describes the exponential-time behavior. Our results are summarized in the last section.

Note: While preparing this paper, we noticed that a complementary study of the walk-SAT algorithm was carried
out independently by G. Semerjian and R. Monasson [21].
II. THE MODELS

A. Random K-satisfiability

A random $K$-satisfiability ($K$-SAT) formula $F$ consists of $M$ logical clauses $\{C_\mu\}_{\mu=1,\dots,M}$ which are
defined over a set of $N$ Boolean variables $\{x_i = 0, 1\}_{i=1,\dots,N}$ which can take the values 0 = FALSE and 1 = TRUE.
Every clause contains $K$ randomly chosen Boolean variables which are connected by logical OR operations ($\vee$) and
appear negated with probability 1/2, e.g. $C_\mu = (x_i \vee \bar{x}_j \vee x_k)$ for $K = 3$. Because of the OR operations, a
$K$-SAT clause is satisfied if at least one of its $K$ variables has the correct assignment. In the formula $F$ all clauses
are connected by logical AND operations ($\wedge$),

$$ F = \bigwedge_{\mu=1}^{M} C_\mu , \tag{1} $$

so all clauses have to be satisfied simultaneously in order to satisfy the formula. For $K = 2$, i.e. if each clause
connects only two variables, the problem is easy, and polynomial-time algorithms are known [24]. On the other hand,
the problem becomes NP-complete for all $K \geq 3$ [23], so one expects that no algorithm solving generic
$K$-SAT formulas in polynomial time can be found.

The considerable attention paid to the random $K$-SAT problem began about one decade ago, when
the model was numerically observed [25] to undergo a characteristic phase transition which is parametrized by the
clause-to-variable ratio $\alpha = M/N$. For $\alpha < 4.26$ and sufficiently large system sizes $N$, almost all 3-SAT formulas were
found to be satisfiable. For $\alpha > 4.26$ this behavior changes drastically; the formulas are found to be unsatisfiable
with a probability approaching one in the thermodynamic limit $N \to \infty$. Even more interestingly, this transition was
observed to coincide with a strong exponential peak in the solution time of complete algorithms. The
hardest-to-solve formulas are thus located close to the phase boundary, and are said to be critically constrained.

The observation of this phase transition finally led to the application of analytical tools developed in the statistical
mechanics of disordered systems, since random $K$-SAT can be mapped to a spin-glass model on a random hypergraph.
After the pioneering work by Monasson and Zecchina [3], which provided the first analytical approximation to the phase
transition using the replica method, many efforts were made to improve the analytical understanding. In Ref. [?], on
the basis of a variational approach, a second phase transition was suggested to appear inside the satisfiable phase: For
very low $\alpha$, the set of all solutions to a $K$-SAT formula was found to be unstructured, with the exponentially large
number of solutions collected in one large connected cluster. For larger $\alpha$ the solution space breaks into an exponential
number of clusters. Using the cavity approach, the (probably) exact location of this transition was established recently
for $K = 3$. It is located at $\alpha \simeq 3.92$.

B. A simpler but similar model: Random K-XOR-SAT

A model showing a very similar behavior, but being analytically much more tractable, is given by the random
$K$-XOR-SAT problem (in the physical literature initially denoted as $K$-hSAT). The difference to $K$-SAT is that
the variables appearing in the clauses are connected by logical XOR operations ($\oplus$) instead of OR. A clause is thus
satisfied if an odd number of variables is assigned correctly, i.e. to TRUE if the variable appears non-negated, and to
FALSE if it appears negated.

The $\oplus$ operation is equivalent to an integer addition modulo 2. Using this equivalence we can map each clause to a
linear equation (modulo 2), and the formula consequently to a coupled set of $M$ linear equations. The solution of this
system can easily be found in $O(N^3)$ steps. Hence XOR-SAT formulas can be solved efficiently by a global algorithm,
i.e. by exploiting the global information about the instance and its structure in every step. If we use, however, local
algorithms like the ones also used for $K$-SAT, we observe a very similar behavior of both models.
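The global approach can be made concrete with a minimal sketch of Gaussian elimination modulo 2. It is not taken from the paper: the function name `solve_xorsat`, the bitmask encoding of the rows, and the input format (a list of `(variables, parity)` pairs, with negations already absorbed into the parity bit) are assumptions chosen for illustration.

```python
def solve_xorsat(equations, n_vars):
    """Solve a system of XOR equations by Gaussian elimination modulo 2.

    equations: iterable of (variables, parity), meaning
               x[v1] ^ x[v2] ^ ... == parity   (parity is 0 or 1).
    Returns one solution as a list of 0/1 values (free variables set to 0),
    or None if the system is inconsistent.
    """
    mask = (1 << n_vars) - 1          # bits 0..n_vars-1 hold the variables
    basis = {}                        # pivot variable -> reduced row
    for variables, parity in equations:
        row = parity << n_vars        # bit n_vars holds the right-hand side
        for v in variables:
            row ^= 1 << v
        while row & mask:
            low = (row & mask) & -(row & mask)   # lowest variable bit
            pivot = low.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]       # eliminate the pivot variable
        else:
            if row >> n_vars:         # row reduced to "0 = 1"
                return None
    x = [0] * n_vars
    # Each stored row only contains variables >= its pivot, so we can
    # back-substitute from the largest pivot downwards.
    for pivot in sorted(basis, reverse=True):
        row = basis[pivot]
        value = (row >> n_vars) & 1
        for w in range(pivot + 1, n_vars):
            if (row >> w) & 1:
                value ^= x[w]
        x[pivot] = value
    return x
```

A clause with an odd number of negated variables simply flips the required parity, so any $K$-XOR-SAT formula can be passed in this form. The cost is polynomial in the system size, in contrast to the local dynamics studied in the rest of the paper.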

Again, the system can be conveniently parametrized by $\alpha = M/N$. The numbers given below are valid for $K = 3$,
but the qualitative picture is valid for any $K \geq 3$. For $\alpha < 0.818$, the formula is typically easy to solve, the solution
space consisting of one large cluster. In the region $0.818 < \alpha < 0.918$, the formulas are still satisfiable with probability
tending to one for $N \to \infty$, but the solution space decays into an exponential number of clusters. In addition, there
are also exponentially many metastable states which prevent even the best local algorithms from fast convergence to
a solution. For $\alpha > 0.918$, the system is almost surely unsatisfiable.

These values were originally calculated using the replica method, which is believed to be exact but still lacks
a rigorous foundation. A very beautiful result for $K$-XOR-SAT was recently obtained in two independent works
[27, 28]: The results given above, including the ones obtained by one-step replica-symmetry-broken calculations, were
reproduced using mathematically rigorous methods.

The $K$-XOR-SAT problem is also interesting from a physical point of view, because it is equivalent to a diluted
$K$-spin model. Such models are frequently discussed in connection with the glass transition, see e.g. [29].

C. Stochastic local search algorithms 

As already mentioned in the introduction, here we are not interested in the solution-space structure of $K$-SAT and
$K$-XOR-SAT, but in the non-equilibrium dynamics of so-called stochastic local search (SLS) algorithms.

The idea behind these algorithms is that, if a formula is satisfiable, a solution can frequently be found more quickly 
if randomized algorithms are used. In general, these algorithms are incomplete, i.e. they stop once they have found 
a solution, but they are not guaranteed to really find one. Due to their random character, they are also not able to 
prove the unsatisfiability of a formula. In the case where there is no solution the algorithm just runs for ever, or until 
some running-time cutoff is reached. 

Here we mainly concentrate on the walk-SAT algorithm introduced in [22]. Its most recent implementations are
available in the SATLIB [30], and they are among the best stochastic local search algorithms for random $K$-SAT. The
algorithm starts with a random assignment to all $N$ variables. Within this assignment, there is a number $\alpha_s N$ of
satisfied clauses, whereas the other $\alpha_u N = (\alpha - \alpha_s) N$ are unsatisfied.

In every step, the algorithm selects an unsatisfied clause $C$ randomly and then one of its $K$ variables $v^*$

• with probability $q$ randomly (walk step),

• with probability $1 - q$ the variable in $C$ occurring in the least number of satisfied clauses (greedy step).

The current assignment of $v^*$ is inverted. All clauses containing $v^*$ that were unsatisfied before now become satisfied.
Clauses that were satisfied behave differently for the two models under consideration: For $K$-SAT, a previously
satisfied clause becomes unsatisfied iff $v^*$ was the only correctly assigned variable in this clause. For $K$-XOR-SAT,
every previously satisfied clause containing $v^*$ becomes unsatisfied.

These steps are repeated until no unsatisfied clause is left. Then the algorithm has found a solution of formula $F$
and stops. As noted earlier, the algorithm will run forever if no solution exists.
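The steps described above can be sketched in a few lines of Python for random $K$-SAT. This is an illustrative sketch only, not the SATLIB implementation: the names `random_ksat`, `satisfied` and `walksat`, the clause encoding as `(variable, required_value)` pairs, and the cutoff `max_flips` are assumptions, and no effort is made to reach the efficiency of production solvers.

```python
import random

def random_ksat(n_vars, n_clauses, k=3, rng=random):
    """Random K-SAT formula: each clause holds k distinct variables, stored as
    (index, required_value) pairs; required_value=False encodes a negation."""
    return [[(v, rng.random() < 0.5) for v in rng.sample(range(n_vars), k)]
            for _ in range(n_clauses)]

def satisfied(clause, assign):
    # OR clause: one correctly assigned variable is enough
    return any(assign[v] == want for v, want in clause)

def walksat(clauses, n_vars, q=1.0, max_flips=100_000, rng=random):
    """Return (assignment, flips), or (None, max_flips) if the cutoff is hit."""
    assign = [rng.random() < 0.5 for _ in range(n_vars)]
    occurs = [[] for _ in range(n_vars)]       # clause indices per variable
    for idx, clause in enumerate(clauses):
        for v, _ in clause:
            occurs[v].append(idx)
    unsat = {i for i, c in enumerate(clauses) if not satisfied(c, assign)}
    for flips in range(max_flips):
        if not unsat:
            return assign, flips
        clause = clauses[rng.choice(sorted(unsat))]   # random unsatisfied clause
        variables = [v for v, _ in clause]
        if rng.random() < q:                   # walk step: random variable
            v_star = rng.choice(variables)
        else:                                  # greedy step: fewest satisfied clauses
            rng.shuffle(variables)             # random tie-breaking
            v_star = min(variables,
                         key=lambda v: sum(i not in unsat for i in occurs[v]))
        assign[v_star] = not assign[v_star]
        for i in occurs[v_star]:               # only these clauses can change
            if satisfied(clauses[i], assign):
                unsat.discard(i)
            else:
                unsat.add(i)
    return (assign, max_flips) if not unsat else (None, max_flips)
```

For $K$-XOR-SAT only `satisfied` would change (an odd number of correctly assigned variables), so that every satisfied clause containing the flipped variable becomes unsatisfied. Dividing the returned number of flips by `n_vars` gives the running time in the MC sweeps used below.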

There are variants for the greedy step: The algorithm could also select the variable in $C$ leading to the minimal
number of unsatisfied clauses ("maximal gain"), or the one minimizing the number of previously satisfied clauses
which become unsatisfied ("minimal negative gain"). The second case is equivalent to our choice for $K$-XOR-SAT.
For $K$-SAT they are different due to the fact that not all satisfied clauses become unsatisfied.

A completely different heuristic is the GSAT heuristic [31] which, in the greedy step, globally selects the variable
leading to the minimal number of unsatisfied clauses. In numerical studies, this selection is outperformed by walk-SAT
[32]. Other heuristic variations of walk-SAT and GSAT are also discussed there. For reasons of clarity we concentrate
completely on the algorithm given above. We expect, however, that the approximate approach developed in this paper
can also be extended to more involved cases, as long as the dynamics can be considered as a Markov process.

A different iteration of variable flips was introduced by Schöning [33]. He suggested to stop the algorithm after $3N$
steps, and to restart it by selecting a new random initial assignment to all Boolean variables. For $q = 1$, i.e. for a
pure random walk dynamics, he was able to prove that the worst-case solution time goes down from $2^N$ iterations to
only $(4/3)^N$ steps, i.e. the algorithm is exponentially accelerated. This simple algorithm shows, up to a refinement
leading to $1.3303^N$ steps, the currently best known worst-case behavior of all SAT algorithms.

In the following sections, we will analyze both models for exponential waiting times and for an exponential number
of random restarts. We will concentrate on formulas which are satisfiable, i.e. on clause-to-variable ratios inside
the satisfiable phase of the model under consideration. In the unsatisfiable phase there are no solutions, thus the
algorithm cannot terminate by construction.

III. NUMERICAL RESULTS ON THE BEHAVIOR OF WALK-SAT 

Now we present some numerical observations on the behavior of walk-SAT applied to randomly generated satisfiability
formulas. We look at $K$-SAT as well as $K$-XOR-SAT, and we mainly concentrate on the solution times
needed by walk-SAT and on the dynamical evolution of the number of unsatisfied clauses while the algorithm is running.
Explaining these observations will be the aim of the following sections.

A. Random K-SAT 

Let us start with random $K$-SAT. We first observe that the running time depends heavily on the ratio $\alpha = M/N$
of clauses to variables. Let us concentrate on the case $K = 3$ and $q = 1$ first, i.e. only walk steps are performed. For
small $\alpha$, negating one variable in an unsatisfied clause rarely causes other clauses to become unsatisfied. Up to a critical
threshold $\alpha_d \simeq 2.7$, a solution is found in a median time growing linearly with $N$; above this point, running times
grow exponentially, see Figs. 1 and 2. This observation does not depend on whether we use the algorithm
with or without restarts. In the following we measure all running times in the number of Monte-Carlo sweeps (MC
sweeps), i.e. a single step of the algorithm leading to the negation of one variable is counted as $\Delta t = 1/N$. During
a time interval of length one, every variable thus becomes negated once on average. Note that in this representation
linear solution times lead to a constant number of MC sweeps, whereas exponentially many iterations of walk-SAT
correspond to exponentially many MC sweeps. In Fig. 3 we show a histogram of the resolution times inside the exponential
regime. Obviously, this distribution can be well described by the mean of the logarithm of the running time. For such
an exponentially dominated distribution this is equivalent to characterizing it by the median, whereas the average
running time would be dominated by exponentially rare events with exponentially longer resolution times.
FIG. 1: 3-SAT: Dependency of the running time of walk-SAT without restarts on the ratio $\alpha$ of clauses to variables.
FIG. 2: 3-SAT: Average number $\alpha_u$ of unsatisfied clauses per variable with sample size $N = 50000$. Initially this energy density
quickly decreases. For $\alpha \lesssim 2.7$ it becomes zero after a finite time; for larger $\alpha$ a non-zero plateau is reached.
FIG. 3: 3-SAT: Histogram of the logarithm of the running times of walk-SAT without restarts for $\alpha = 3.5$ and $N = 100$.



The algorithm starts with an extensive number of unsatisfied clauses, and stops when their number reaches zero. To
characterize the search process we therefore look at the behavior of $\alpha_u(t)$, which is given as the number of unsatisfied
clauses per variable. We can think of it as an energy density of the system of the $N$ variables. In a randomly drawn
starting configuration of the Boolean variables $x_i$, $i = 1, \dots, N$, on average 1/8 of all clauses are unsatisfied; we
thus have almost surely $\alpha_u(t = 0) = \alpha/8$. Concentrating first on the linear-time behavior, i.e. on finite MC
times, it is convenient to work with large systems, $N \gg 1$. These show a good separation of linear and exponential time
scales and also minimize the influence of fluctuations. Numerically we find, depending on $\alpha$, the following behavior:

• For $\alpha < \alpha_d$, a solution is found after a finite number of MC sweeps, i.e. $\alpha_u(t)$ becomes zero at finite MC times.
This solution time grows with $\alpha$, and diverges once we approach the dynamical threshold $\alpha_d$.

• For $\alpha > \alpha_d$, the energy density $\alpha_u(t)$ initially decreases and quickly equilibrates to a non-zero plateau (Fig. 2).
For larger times $\alpha_u(t)$ fluctuates around its plateau value, as can be seen for smaller system sizes, cf. Fig. 4.
Eventually, and only if the formula is satisfiable, one of these fluctuations is large enough to reach $\alpha_u(t) = 0$.
FIG. 4: After the initial decrease, $\alpha_u$ fluctuates around its plateau value. Two different system sizes are shown. For the smaller
one with $N = 150$, a fluctuation after about 145 MC steps was large enough to reach a solution of the formula.

This behavior explains the origin of the title of the paper: For $\alpha > \alpha_d$, the system equilibrates to a non-zero number
of unsatisfied clauses, and only fluctuations around this equilibrium lead the dynamics to satisfying assignments, and
the algorithm stops. Such macroscopic fluctuations appear, of course, only with exponentially small probability, giving
rise to exponential solution times.

This observation leads to an obvious way of improving the algorithmic performance: We may choose a better
heuristic having a lower equilibrium number of unsatisfied clauses. Exactly this is achieved by introducing a fraction
$1 - q > 0$ of greedy steps, see Fig. 5, where the plateau energy is determined as a function of $q$ for two different values
of $\alpha > \alpha_d$. We can see a minimum in the plateau energy for high values of $q$. The dynamical threshold itself also
changes slightly and has a maximum at $q \simeq 0.85$. There, formulas up to $\alpha \simeq 2.8$ can be solved in linear time.




FIG. 5: 3-SAT: Plateau energy for $\alpha = 3.5$ and $\alpha = 3.0$ as a function of the parameter $q$ (a fraction $1 - q$ of the steps
performed by the algorithm are greedy). The plateau energy is minimal for $q = 0.95$ resp. $q = 0.85$.



B. Random K-XOR-SAT

A qualitatively similar behavior can be observed for random $K$-XOR-SAT, for $K = 3$ and $q = 1$ (pure walk
dynamics). The main difference is of a quantitative nature: the dynamical threshold marking the onset of exponential
solution times is located at $\alpha_d \simeq 0.33$. We therefore do not repeat the figures given for random 3-SAT, but the
corresponding numerical data can be found in the following sections in comparison to analytical results.

IV. A RATE-EQUATION APPROACH TO THE LINEAR-TIME BEHAVIOR 



The main idea of the analytical approach presented in this section is to characterize each variable only by the
number of satisfied and unsatisfied clauses it is contained in. We subdivide the set of all $N$ Boolean variables into
subsets of $N_t(s,u)$ variables belonging to $s$ satisfied and $u$ unsatisfied clauses; for a randomly selected variable, the
numbers $s$ and $u$ are thus taken with probability $p_t(s,u) = N_t(s,u)/N$. The numbers $N_t(s,u)$, and thus also the
probabilities $p_t(s,u)$, are changed by the action of walk-SAT, but for every single variable $s + u$ remains constant, as
it counts the total number of clauses containing this variable.

From these quantities we can, in particular, calculate the total number of unsatisfied clauses $N\alpha_u(t)$. Taking into
account that by summing over variables every clause is counted $K$-fold, we find

$$ \alpha_u(t) = \frac{\langle u \rangle_t}{K} , \tag{2} $$

where $\langle \cdot \rangle_t = \sum_{s,u} (\cdot)\, p_t(s,u)$ denotes the average over the distribution $p_t$ at MC time $t$.

The algorithm does not select variables according to $p_t(s,u)$, but first selects an unsatisfied clause $C^*$ and then,
according to the chosen heuristic (greedy or walk step), flips one of the variables $v^*$ in $C^*$. The probability that
variable $v^*$ belongs to exactly $s$ satisfied and $u$ unsatisfied clauses is denoted by $p_t^{(\mathrm{flip})}(s,u)$, and can be calculated
from $p_t(s,u)$ under the assumption of independence of neighboring sites, i.e. we assume that the joint distribution for
three variables being in one unsatisfied clause factorizes. This assumption, which we will exploit more frequently, is
the main approximation we apply in the analytical approach, and it allows us to describe the full dynamics in terms
of $p_t(s,u)$. It is strictly valid only for the initial configuration of the dynamics, but as we will see below, it can give a
good approximation also for larger times.

For a walk step, variable $v^*$ is randomly selected in $C^*$. There are $u N_t(s,u)$ possibilities for selecting a $v^*$ which
appears in $s$ satisfied and $u$ unsatisfied clauses. By normalization we thus find the following selection probability:

$$ \frac{u\, p_t(s,u)}{\langle u \rangle_t} =: p_t^{(u)}(s,u) . \tag{3} $$



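For a greedy step, the only random choice is the selection of the unsatisfied clause $C^*$. Then the variable $v^*$ is
selected which appears in the smallest number $s$ of satisfied clauses among all $K$ variables in $C^*$. If there is more
than one variable with the same minimal $s$, then one of them is chosen randomly. Applying the independent-site
assumption, and using the Heaviside function under the convention $\Theta(0) = 1/2$, we find for $K = 2$

$$
\begin{aligned}
p_t^{(\mathrm{flip\text{-}2\text{-}greedy})}(s,u)
 &= \sum_{s_1,u_1,s_2,u_2} p_t^{(u)}(s_1,u_1)\, p_t^{(u)}(s_2,u_2)
    \left[ \delta_{(s_1,u_1),(s,u)}\, \Theta(s_2 - s_1) + \delta_{(s_2,u_2),(s,u)}\, \Theta(s_1 - s_2) \right] \\
 &= 2\, p_t^{(u)}(s,u) \sum_{s',u'} p_t^{(u)}(s',u')\, \Theta(s' - s) \\
 &= p_t^{(u)}(s,u) \left[ 2 - \sum_{u'} \left( p_t^{(u)}(s,u') + 2 \sum_{s'=0}^{s-1} p_t^{(u)}(s',u') \right) \right]
\end{aligned}
\tag{4}
$$

and similarly for $K = 3$

$$
\begin{aligned}
p_t^{(\mathrm{flip\text{-}3\text{-}greedy})}(s,u)
 &= 3\, p_t^{(u)}(s,u) \sum_{s',u',s'',u''} p_t^{(u)}(s',u')\, p_t^{(u)}(s'',u'')
    \left[ \Theta(s' - s)\, \Theta(s'' - s) + \tfrac{1}{12}\, \delta_{s,s'}\, \delta_{s,s''} \right] \\
 &= 3\, p_t^{(u)}(s,u) \left\{ \left[ 1 - \sum_{u'=0}^{\infty} \left( \tfrac{1}{2}\, p_t^{(u)}(s,u') + \sum_{s'=0}^{s-1} p_t^{(u)}(s',u') \right) \right]^2
    + \tfrac{1}{12} \left[ \sum_{u'=0}^{\infty} p_t^{(u)}(s,u') \right]^2 \right\} .
\end{aligned}
\tag{5}
$$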



Note that the contribution $\delta_{s,s'}\delta_{s,s''}/12$ is a correction term for the case $s = s' = s''$, which results from the
convention $\Theta(0) = 1/2$.

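For the full algorithm, these two different steps appear with probabilities $q$ and $1 - q$. The selection probability
$p_t^{(\mathrm{flip})}$ is thus given by the linear combination of the two cases,

$$ p_t^{(\mathrm{flip})}(s,u) = q\, p_t^{(\mathrm{flip\text{-}walk})}(s,u) + (1 - q)\, p_t^{(\mathrm{flip\text{-}}K\mathrm{\text{-}greedy})}(s,u) , \tag{6} $$

with $p_t^{(\mathrm{flip\text{-}walk})}(s,u) = p_t^{(u)}(s,u)$ from Eq. (3).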

At this point, GSAT-like heuristics could also be included, e.g. by taking $p_t^{(\mathrm{flip})}(s,u) \sim u^{\gamma} p_t(s,u)$ with $\gamma > 1$. This
would guarantee a preferential selection of variables belonging to a high number of unsatisfied clauses. Here we do
not consider this additional possibility.
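As a consistency check of the selection probabilities, the following sketch evaluates the walk-step probability of Eq. (3) and the closed forms of the greedy probabilities for $K = 2$ and $K = 3$, and mixes them as in Eq. (6). The function names and the dictionary representation `{(s, u): probability}` are assumptions for illustration. With the convention $\Theta(0) = 1/2$ and the $1/12$ correction term, the result should stay normalized for any input distribution, which is what the code allows one to verify.

```python
def walk_selection(p):
    """Eq. (3): p^(u)(s,u) = u p(s,u) / <u>, for p given as {(s, u): prob}."""
    mean_u = sum(u * w for (s, u), w in p.items())
    return {(s, u): u * w / mean_u for (s, u), w in p.items() if u > 0}

def flip_selection(p, q, K):
    """Mixture of walk and greedy steps, Eq. (6), for K = 2 or K = 3."""
    p_u = walk_selection(p)
    a = {}                                    # a[s] = sum_u p^(u)(s, u)
    for (s, u), w in p_u.items():
        a[s] = a.get(s, 0.0) + w
    below, running = {}, 0.0                  # below[s] = sum_{s' < s} a[s']
    for s in sorted(a):
        below[s] = running
        running += a[s]
    out = {}
    for (s, u), w in p_u.items():
        # sum over (s', u') of p^(u)(s', u') * Theta(s' - s), with Theta(0) = 1/2
        theta = 1.0 - 0.5 * a[s] - below[s]
        if K == 2:
            greedy = 2.0 * w * theta
        elif K == 3:
            greedy = 3.0 * w * (theta ** 2 + a[s] ** 2 / 12.0)
        else:
            raise ValueError("only K = 2 and K = 3 are written out here")
        out[(s, u)] = q * w + (1.0 - q) * greedy
    return out
```

For example, `sum(flip_selection(p, q, 3).values())` can be compared with 1 for an arbitrary normalized `p`; for `q = 0` the weight is shifted towards variables with small `s`, as expected for greedy steps.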



A. A Poissonian estimate for the pure walk dynamics 

For a moment we concentrate on the simplified case where the algorithm uses only walk steps, i.e. on $q = 1$ [35]. We
further assume that $s$ and $u$ are, for arbitrary times, distributed independently according to Poissonian distributions:

$$ p_t(s,u) = e^{-K\alpha}\, \frac{[K\alpha_s(t)]^{s}\, [K\alpha_u(t)]^{u}}{s!\, u!} . \tag{7} $$

Again, this assumption is valid for $t = 0$, whereas deviations appear for larger times. On average each variable is
contained in $K\alpha_s(t) = K(\alpha - \alpha_u(t))$ satisfied and $K\alpha_u(t)$ unsatisfied clauses. If we plug this ansatz into Eq. (3) we get
for an algorithm without greedy steps

$$ p_t^{(\mathrm{flip})}(s,u) = e^{-K\alpha}\, \frac{[K\alpha_s(t)]^{s}\, [K\alpha_u(t)]^{u-1}}{s!\, (u-1)!} , \tag{8} $$

which again is a product of Poissonian distributions of $s$ and $u - 1$. Hence, on average, the negated variable $v^*$ is
contained in $K\alpha_s(t)$ satisfied and $K\alpha_u(t) + 1$ unsatisfied clauses.
1. Random K-XOR-SAT

We continue by first considering the analytically simpler case of $K$-XOR-SAT. There, by flipping $v^*$, all $s$ satisfied
clauses containing $v^*$ become unsatisfied, whereas all $u$ unsatisfied ones become satisfied. The expected number of
unsatisfied clauses $N_t^{(u)}$ changes during one step as

$$ \Delta N_t^{(u)} = -\left( K\alpha_u(t) + 1 \right) + K\alpha_s(t) = K\alpha - 2K\alpha_u(t) - 1 . \tag{9} $$

Concentrating on the average dynamics, which is followed with probability approaching one in the thermodynamic
limit $N \to \infty$, we have $N_t^{(u)} = N\alpha_u(t)$. Measuring the time $t$ in MC sweeps, every algorithmic step contributes
$\Delta t = 1/N$, and the difference on the left-hand side of Eq. (9) can be replaced by a time derivative (if $N \gg 1$),

$$ \dot{\alpha}_u(t) = K\alpha - 2K\alpha_u(t) - 1 . \tag{10} $$

If we solve this differential equation we get for the energy density of $K$-XOR-SAT

$$ \alpha_u(t) = \frac{1}{2K} \left( K\alpha - 1 + C e^{-2Kt} \right) . \tag{11} $$

In the typical starting configuration half the clauses are satisfied and half are not, i.e. $\alpha_u(0) = \alpha/2$. So we finally get

$$ \alpha_u(t) = \frac{1}{2K} \left( K\alpha - 1 + e^{-2Kt} \right) . \tag{12} $$

In Fig. 6 the results for different $\alpha$ are compared to numerical simulations. For small times both curves coincide,
because correlations have not yet built up. Later the algorithm reaches a lower density of unsatisfied clauses than the
Poissonian approximation would suggest.

We also see that there are two different regimes. For small $\alpha$ the energy decreases quickly to zero, reaching zero at
finite MC times with non-zero slope. For larger $\alpha$, the number of unsatisfied clauses first decreases, but then reaches
a positive plateau value. Both regimes are separated by a dynamical threshold which is located at

$$ \alpha_d = \frac{1}{K} . \tag{13} $$

In the special case $K = 3$ we thus find $\alpha_d = 1/3$, which coincides perfectly with our numerical findings. Note that
for $\alpha < \alpha_d$, the algorithm thus constructs a satisfying assignment already after a linear number of algorithmic steps.
Above $\alpha_d$, the algorithm does not reach a solution in linear time, with a probability tending to one in the large-$N$
limit.
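The Poissonian estimate for $K$-XOR-SAT is easy to evaluate numerically. The sketch below implements the closed-form energy density $\alpha_u(t) = (K\alpha - 1 + e^{-2Kt})/(2K)$ and, as a derived quantity not written out in the text, the MC time at which this expression reaches zero, $t^* = -\ln(1 - K\alpha)/(2K)$ for $\alpha < 1/K$. The function names are assumptions for illustration.

```python
import math

def alpha_u_xorsat(t, alpha, K=3):
    """Poissonian estimate of the unsatisfied-clause density at MC time t."""
    return (K * alpha - 1.0 + math.exp(-2.0 * K * t)) / (2.0 * K)

def solution_time_xorsat(alpha, K=3):
    """MC time at which the estimate reaches zero.

    Returns None for alpha >= 1/K, where the estimate saturates at the
    positive plateau (K*alpha - 1)/(2K) instead of reaching zero.
    """
    if K * alpha >= 1.0:
        return None
    return -math.log(1.0 - K * alpha) / (2.0 * K)
```

Within this estimate the solution time grows with $\alpha$ and diverges logarithmically as $\alpha$ approaches $1/K$, in line with the qualitative picture of a dynamical threshold described above.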

2. Random K-SAT 

For random $K$-SAT we can get a similar estimate. We have to take into account that now satisfied clauses do not
necessarily become unsatisfied if a contained variable is inverted. For each $K$-SAT clause $C$ there is one unsatisfying
and $2^K - 1$ possible satisfying assignments. The only case where the clause becomes unsatisfied by flipping a single
variable $v^*$ is the assignment where this variable is the only correctly assigned variable in $C$. If we assume independent
clauses this happens with probability $1/(2^K - 1)$, so we get for the expected number of unsatisfied clauses

$$ \Delta N_t^{(u)} = -\left( K\alpha_u(t) + 1 \right) + \frac{1}{2^K - 1}\, K\alpha_s(t) = \frac{K\alpha}{2^K - 1} - \frac{2^K K}{2^K - 1}\, \alpha_u(t) - 1 . \tag{14} $$

Going for $N \to \infty$ again to continuous-time quantities and differential equations, we find

$$ \dot{\alpha}_u(t) = \frac{K\alpha}{2^K - 1} - \frac{2^K K}{2^K - 1}\, \alpha_u(t) - 1 \tag{15} $$

with solution (the initial condition is given by $\alpha_u(0) = \alpha/2^K$)

$$ \alpha_u(t) = \frac{1}{2^K K} \left( K\alpha + \left[ 2^K - 1 \right] \left[ \exp\left( -\frac{2^K K}{2^K - 1}\, t \right) - 1 \right] \right) , \tag{16} $$

cf. Fig. 8. For random $K$-SAT we thus find the Poissonian estimate

$$ \alpha_d = \frac{2^K - 1}{K} \tag{17} $$

for the onset of exponential solution times. In the special case $K = 3$ we get $\alpha_d = 7/3$, which is smaller than the
numerical value 2.7.
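The $K$-SAT estimate can be cross-checked by integrating the linear rate equation $\dot{\alpha}_u = K\alpha/(2^K - 1) - 2^K K \alpha_u/(2^K - 1) - 1$ directly and comparing with its closed-form solution for the initial condition $\alpha_u(0) = \alpha/2^K$. The sketch below uses a plain explicit Euler scheme; the function names, step size and integration time are assumptions for illustration.

```python
import math

def alpha_u_ksat(t, alpha, K=3):
    """Closed-form Poissonian estimate with alpha_u(0) = alpha / 2**K."""
    c = 2 ** K
    return (K * alpha + (c - 1) * (math.exp(-c * K * t / (c - 1)) - 1.0)) / (c * K)

def euler_alpha_u_ksat(alpha, K=3, dt=1e-4, t_max=1.0):
    """Integrate the linear rate equation with explicit Euler steps."""
    c = 2 ** K
    a = alpha / c                              # random initial assignment
    for _ in range(int(round(t_max / dt))):
        a += dt * (K * alpha / (c - 1) - c * K * a / (c - 1) - 1.0)
    return a

def alpha_d_ksat(K=3):
    """Poissonian estimate of the dynamical threshold, (2**K - 1) / K."""
    return (2 ** K - 1) / K
```

Above the estimated threshold the closed form saturates at the plateau $(K\alpha - 2^K + 1)/(2^K K)$; below it, the expression turns negative at a finite time, which signals that a solution is reached in linear time.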
B. Rate equation for the walk-SAT algorithm

We have seen that already a simple Poissonian approximation is able to qualitatively reproduce the behavior of
walk-SAT for linear solution times, at least for a pure walk dynamics without greedy steps. There were, however,
some systematic quantitative deviations, in particular for the case of random $K$-satisfiability. It is thus necessary to
go beyond the simple Poissonian ansatz for $p_t(s,u)$, i.e. for the time-dependent fraction of Boolean variables belonging
to exactly $s$ satisfied and $u$ unsatisfied clauses. Our aim is to work only with these quantities, i.e. we still have to
keep the approximation that the joint distribution for variables within one clause factorizes. This approximation of
independent neighboring variables was already used in the beginning of this section, when $p_t^{(\mathrm{flip})}(s,u)$ was derived,
cf. Eqs. (3)-(6).



As above, we denote by $N_t(s,u) = N p_t(s,u)$ the expected number of variables that occur in exactly $s$ satisfied and
$u$ unsatisfied clauses at step $t$. Our algorithm starts at $t = 0$ and each step counts as $\Delta t$. We follow the procedure in
[36] to describe the typical evolution of the algorithm.

A variable $v^*$ with $s^*$ satisfied and $u^*$ unsatisfied clauses is flipped. This occurs with probability $p_t^{(\mathrm{flip})}(s^*,u^*)$.
Three different processes contribute to $N_{t+\Delta t}(s,u)$:

• Contribution by $v^*$: The $s^*$ satisfied clauses become unsatisfied, whereas the $u^*$ unsatisfied clauses become
satisfied. The number of variables characterized by $s^*$ satisfied, $u^*$ unsatisfied clauses is thus decreased by one,
the number of variables in $u^*$ satisfied and $s^*$ unsatisfied clauses is increased by one. This means that the
expected number of variables $N_t(s^*,u^*)$ is decreased by $p_t^{(\mathrm{flip})}(s^*,u^*)$, and $N_t(u^*,s^*)$ is increased by the same
amount.

• Neighbors of $v^*$ in previously satisfied clauses: The flipped variable $v^*$ occurs, on average, in $\langle s \rangle_t^{(\mathrm{flip})}$ previously
satisfied clauses, where $\langle \cdot \rangle_t^{(\mathrm{flip})} = \sum_{s,u} (\cdot)\, p_t^{(\mathrm{flip})}(s,u)$. Since each clause contains $K$ variables, and since random
formulas are locally tree-like, there are on average $(K - 1) \langle s \rangle_t^{(\mathrm{flip})}$ neighbors in previously satisfied clauses.
All these clauses become unsatisfied. This means that for each other variable contained in these satisfied
clauses, the number of satisfied clauses goes down by one, the number of unsatisfied clauses is increased by one.
Taking into account that, according to the assumption of independent neighbors, these belong to $s$ satisfied
and $u$ unsatisfied clauses with probability $s\, p_t(s,u) / \langle s \rangle_t$, we conclude that $N_t(s,u)$ is, on average, decreased by
$(K - 1) \langle s \rangle_t^{(\mathrm{flip})}\, s\, p_t(s,u) / \langle s \rangle_t$. One out of these $s$ satisfied clauses is the one with the flipped variable $v^*$, so the decrease
of $N_t(s,u)$ is now added to $N_t(s - 1, u + 1)$.

• Neighbors of $v^*$ in previously unsatisfied clauses: Analogously to the discussion in the last item, one gets
contributions to $N_t(s,u)$ for variables $v$ which occur together with $v^*$ in unsatisfied clauses.

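1. Random K-XOR-SAT

Combining these processes we get an evolution equation for the expected numbers $N_t(s,u)$ of variables appearing
in exactly $s$ satisfied and $u$ unsatisfied clauses at time $t$:

$$
\begin{aligned}
N_{t+\Delta t}(s,u) = {} & N_t(s,u) - p_t^{(\mathrm{flip})}(s,u) + p_t^{(\mathrm{flip})}(u,s) \\
 & + (K - 1) \langle s \rangle_t^{(\mathrm{flip})} \left[ -\frac{s\, p_t(s,u)}{\langle s \rangle_t} + \frac{(s + 1)\, p_t(s + 1, u - 1)}{\langle s \rangle_t} \right] \\
 & + (K - 1) \langle u \rangle_t^{(\mathrm{flip})} \left[ -\frac{u\, p_t(s,u)}{\langle u \rangle_t} + \frac{(u + 1)\, p_t(s - 1, u + 1)}{\langle u \rangle_t} \right] .
\end{aligned}
\tag{18}
$$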



Setting again $\Delta t = 1/N$ and replacing differences by derivatives in the thermodynamic limit,

$$ N_{t+\Delta t}(s,u) - N_t(s,u) = N \left( p_{t+\Delta t}(s,u) - p_t(s,u) \right) = \frac{p_{t+\Delta t}(s,u) - p_t(s,u)}{\Delta t} \to \frac{d}{dt}\, p_t(s,u) , \tag{19} $$



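we get a set of differential equations for $p_t(s,u)$:

$$
\begin{aligned}
\frac{d}{dt}\, p_t(s,u) = {} & -p_t^{(\mathrm{flip})}(s,u) + p_t^{(\mathrm{flip})}(u,s) \\
 & + (K - 1) \langle s \rangle_t^{(\mathrm{flip})} \left[ -\frac{s\, p_t(s,u)}{\langle s \rangle_t} + \frac{(s + 1)\, p_t(s + 1, u - 1)}{\langle s \rangle_t} \right] \\
 & + (K - 1) \langle u \rangle_t^{(\mathrm{flip})} \left[ -\frac{u\, p_t(s,u)}{\langle u \rangle_t} + \frac{(u + 1)\, p_t(s - 1, u + 1)}{\langle u \rangle_t} \right] .
\end{aligned}
\tag{20}
$$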



In the typical initial configuration the probability of a clause to be unsatisfied is 1/2, and so $p_0(s,u)$ is given by
Eq. (7) with $\alpha_s(0) = \alpha_u(0) = \alpha/2$.

By numerical integration of (20) we can find the typical trajectory for an algorithm with given $p_t^{(\mathrm{flip})}$. The
results for an algorithm without greedy steps (i.e. $p_t^{(\mathrm{flip})}(s,u) = p_t^{(\mathrm{flip-walk})}(s,u)$) for different values of
the ratio $\alpha = M/N$ are shown in figure 6. They are compared with numerical data obtained from single runs of
the algorithm on a single large, randomly selected sample formula. As we can see, the assumption of independent
variables is suitable to describe the behavior of the algorithm in this model. We also see that the dynamical
threshold $\alpha_d$, which marks the onset of exponential solution times, is again given by 1/3.
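
The numerical integration of (20) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the figures: it assumes walk steps only with $p_t^{(\mathrm{flip})}(s,u) = u\,p_t(s,u)/\langle u\rangle_t$ (the form consistent with the Poissonian expression entering Eq. (25)), a Poissonian initial distribution with $\alpha_s(0) = \alpha_u(0) = \alpha/2$, a plain Euler scheme, and a truncation of $s$ and $u$ at a hypothetical cutoff `n_max`. The function name and all parameters are illustrative.

```python
from math import exp, factorial

import numpy as np


def integrate_xorsat_rate_equations(alpha, K=3, t_max=5.0, dt=1e-3, n_max=25):
    """Euler integration of the walk-only K-XOR-SAT rate equations, Eq. (20).

    Returns (times, energies, p): energies[i] = <u>/K is the density of
    unsatisfied clauses, p[s, u] is the distribution at the last time.
    """
    lam = K * alpha / 2.0  # assumed Poissonian start, alpha_s = alpha_u = alpha/2
    poisson = np.array([exp(-lam) * lam**k / factorial(k) for k in range(n_max + 1)])
    p = np.outer(poisson, poisson)  # p[s, u]
    n = np.arange(n_max + 1)
    s, u = n[:, None], n[None, :]

    times, energies = [0.0], [(u * p).sum() / K]
    for step in range(1, int(round(t_max / dt)) + 1):
        mean_s, mean_u = (s * p).sum(), (u * p).sum()
        if mean_u < 1e-9 or mean_s < 1e-9:  # (almost) no unsatisfied clause left
            break
        p_flip = u * p / mean_u  # assumed walk-step flip probability
        s_flip, u_flip = (s * p_flip).sum(), (u * p_flip).sum()

        # Flipped variable itself: (s, u) -> (u, s).
        dp = -p_flip + p_flip.T

        # Neighbors in previously satisfied clauses: (s, u) -> (s-1, u+1).
        out_s = s * p / mean_s
        gain_s = np.zeros_like(p)
        gain_s[:-1, 1:] = out_s[1:, :-1]
        dp += (K - 1) * s_flip * (gain_s - out_s)

        # Neighbors in previously unsatisfied clauses: (s, u) -> (s+1, u-1).
        out_u = u * p / mean_u
        gain_u = np.zeros_like(p)
        gain_u[1:, :-1] = out_u[:-1, 1:]
        dp += (K - 1) * u_flip * (gain_u - out_u)

        p = p + dt * dp
        times.append(step * dt)
        energies.append((u * p).sum() / K)
    return np.array(times), np.array(energies), p
```

Calling `integrate_xorsat_rate_equations(0.75)` returns the energy density as a function of MC time; the cutoff `n_max` and the step `dt` are convenience choices and should be varied to check convergence.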




FIG. 6: 3-XOR-SAT: Typical number of unsatisfied clauses (divided by $N$), as a function of the MC time $t$, for walk-SAT
with walk steps only. Different ratios $\alpha$ are shown; from top to bottom we have $\alpha = 1.5, 1, 0.75, 0.5, 0.35, 0.2$. The
dashed line is obtained by numerically integrating equations (20), the full line gives the Poissonian approximation. These
results are compared to the evolution for a single (random) 3-XOR-SAT instance with $N = 50000$, as given by the symbols.



When analyzing the algorithm including a fraction of greedy steps, we see that the assumption of independent
variables is indeed crucial. In figure 7 we show the result of the numerical integration, now using
$p_t^{(\mathrm{flip})}(s,u) = q\,p_t^{(\mathrm{flip-walk})}(s,u) + (1-q)\,p_t^{(\mathrm{flip-greedy})}(s,u)$ given by eq. (??). Since in this case the
probability of a variable to be flipped depends on its neighbors, correlations between neighboring variables naturally
build up. This explains why the ansatz does not give a good quantitative approximation when greedy steps are included.



2. Random K-SAT 



Let us now consider the slightly more involved case of random K-SAT. As already discussed in the context of the
Poissonian approximation, we have to take into account that flipping a variable does not necessarily unsatisfy all
previously satisfied clauses the variable is contained in. We assume again that the probability of such a clause to
become unsatisfied is, independently of clause and time, given by its naive average $\mu = 1/(2^K - 1)$. Similar to
XOR-SAT we get three contributions to $N_{t+\Delta t}(s,u)$: one coming from the flipped variable itself, and two from
neighbors in previously satisfied (resp. unsatisfied) clauses.

FIG. 7: 3-XOR-SAT: Influence of greedy steps on the behavior of the energy density at $\alpha = 0.75$. As above, the dashed
line is obtained by numerically integrating equation (20) after plugging in eq. (??) and using $N = 1/\Delta t = 50000$. The
dotted line shows the evolution of a single (random) run of the algorithm with $N = 50000$. From top to bottom we have
the case without greedy steps, followed by $q = 0.5$, $q = 0.7$, $q = 0.9$. The energy plateau decreases with $q$, but due to
correlations the integrated equation does not fit the numerical data.



• If the flipped variable $v^*$ appears in exactly $s^*$ satisfied and $u^*$ unsatisfied clauses then, as in XOR-SAT,
  $N_t(s^*,u^*)$ is decreased by one. This happens with probability $p_t^{(\mathrm{flip})}(s^*,u^*)$.

  By flipping $v^*$, all $u^*$ previously unsatisfied clauses become satisfied. Out of the $s^*$ previously satisfied
  clauses, a random number $k$ remains satisfied and $s^* - k$ become unsatisfied, i.e. $N_t(u^*+k, s^*-k)$ is increased by
  one. There are $\binom{s^*}{k}$ possibilities for selecting these $k$ clauses, each one appearing with probability
  $\mu^{s^*-k}(1-\mu)^{k}$. The total contribution by $v^*$ is obtained by summing over all possible values of $k$.

• The contributions from neighbors of the flipped variable are similar to XOR-SAT. The only difference is that
  the average number of neighboring variables in satisfied clauses becoming unsatisfied is now
  $\mu(K-1)\langle s\rangle_t^{(\mathrm{flip})}$.

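Combining all contributions we derive a set of differential equations for the probability distribution of the variables:

$$
\begin{aligned}
\frac{d}{dt}\,p_t(s,u) = &- p_t^{(\mathrm{flip})}(s,u) + \sum_{k=0}^{s}\binom{u+k}{k}\,\mu^{u}(1-\mu)^{k}\,p_t^{(\mathrm{flip})}(u+k,s-k) \\
&+ \mu(K-1)\langle s\rangle_t^{(\mathrm{flip})}\left[-\frac{s\,p_t(s,u)}{\langle s\rangle_t} + \frac{(s+1)\,p_t(s+1,u-1)}{\langle s\rangle_t}\right] \\
&+ (K-1)\langle u\rangle_t^{(\mathrm{flip})}\left[-\frac{u\,p_t(s,u)}{\langle u\rangle_t} + \frac{(u+1)\,p_t(s-1,u+1)}{\langle u\rangle_t}\right]
\end{aligned}
\tag{21}
$$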

These equations also have to be solved numerically. The results for the most interesting case $K = 3$ (i.e.
$\mu = 1/(2^K - 1) = 1/7$) for different values of $\alpha$ are shown in figure 8. Even if they are quantitatively much more
accurate than the Poissonian approximation, there are some systematic deviations compared to direct numerical
simulations. The curves match the simulation results for small times. Then correlations between neighboring variables
build up, violating our basic assumption. However, for larger times both curves match again, because the same
distribution $p_t(s,u)$ is reached. This can be seen in the histograms in figure 9: At $t = 1.4$ the distributions $p_t(s,u)$
as derived from the rate equations and as evaluated numerically are different, while at $t = 6$ they have again almost
converged to the same distribution.

This observation allows for a precise determination of the dynamical threshold $\alpha_d$ which marks the transition from
typically linear to exponential algorithmic solution times needed by walk-SAT: The transition is defined by the point
where the expected energy density $\alpha_u(t)$ asymptotically does not decrease to zero any more. In figure 10 one can see
that, for $K = 3$, this happens at $\alpha_d \simeq 2.71$.

FIG. 8: 3-SAT: Running time of the walk-SAT algorithm with walk steps only. Different ratios $\alpha$ are shown; from top to
bottom we have $\alpha = 4.0, 3.5, 3.0, 2.85, 2.7$. The dashed line is obtained by integrating equations (21) with
$N = 1/\Delta t = 50000$. The symbols show the evolution of a single (random) run of the algorithm with $N = 50000$. The solid
line shows the analytical solution (12) of the Markov equation assuming a Poissonian distribution $p_t(s,u)$ for all times
$t$; for clarity only $\alpha = 4.0$ and $2.85$ are depicted.

FIG. 9: 3-SAT: Distributions $p_t(s,u)$ for $t = 1.4$ (left) and $t = 6$ (right). The results are shown as a function of $s$; the
different curves correspond to $u = 0, 1, 2, 3$ (from top to bottom). One can see that numerical and analytical results
differ for $t = 1.4$, whereas they are very close for larger times corresponding to the energy plateau.



As already observed for XOR-SAT, the influence of greedy steps cannot be reproduced very well. In figure 11 we
show results for three different $q$ at $\alpha = 3.5$. The energy density obtained by assuming independent variables is
too low. For $q = 0.9$ it even decreases to zero at finite times, contrary to our numerical results (cf.
Sec. III A). We therefore conclude that the independent-neighbor approximation is only suitable for the case without
greedy steps, where fewer correlations can build up.



V. LARGE DEVIATIONS AND THE EXPONENTIAL-TIME BEHAVIOR 



In the last section, we have characterized the typical linear-time behavior of walk-SAT on satisfiable, randomly
generated K-SAT and K-XOR-SAT formulas. We have, within some approximation assuming independent neighbors,
calculated the trajectory which is followed by the system in terms of the probabilities $p_t(s,u)$ that a randomly selected
variable belongs to exactly $s$ satisfied and $u$ unsatisfied clauses. "Typical" behavior means in this context that
the trajectory is followed with probability tending to one in the thermodynamic limit $N \to \infty$.

FIG. 10: 3-SAT: The left curve shows the (linear) solution time after which the expected energy density $\alpha_u(t)$ (from
the rate equations) reaches zero, as a function of $\alpha$. This time diverges logarithmically at $\alpha_d$. For larger $\alpha$, a
non-zero energy plateau is found, which is shown in the right curve.

FIG. 11: 3-SAT: Influence of greedy steps at $\alpha = 3.50$. As above, the dashed line is obtained by numerically integrating
equation (21) after plugging in eq. (??). From top to bottom we have $q = 0.5$, $q = 0.7$, $q = 0.9$. The symbols show
simulation data for the evolution of a single run of the algorithm with $N = 500000$.

We have seen that there exists some (model-dependent) dynamical threshold $\alpha_d$, below which the algorithm reaches
zero energy, i.e. a solution of the SAT formula, after linear time. Above $\alpha_d$, however, the typical trajectory shows a
fast equilibration towards a non-zero plateau value $\alpha_u(t \to \infty)$. The walk-SAT algorithm is no longer able to
construct a solution in linear time, i.e. we expect the solution times to become exponentially large. The approach of
Sec. IV thus fails to describe the final descent of the energy to zero.

In Sec. III we have seen that, for smaller system sizes, the number of unsatisfied clauses fluctuates around its
expected value. Eventually these fluctuations become large enough that the system hits a solution by chance:
fluctuations are the way walk-SAT finally succeeds in constructing a solution. However, we expect these fluctuations
to be exponentially rare, i.e. we almost surely have to wait an exponentially long time to actually reach a solution.

This section is dedicated to characterizing these fluctuations or, more precisely, to calculating the probability
$P(\alpha_u(0) \to \alpha_u(t_f) = 0)$ that the system reaches $\alpha_u(t_f) = 0$ after some finite time $t_f$, under the condition that
the system started initially with some $\alpha_u(0)$. This probability gives all important information about the dominant
exponential contribution to the typical running times $t_{sol} \sim e^{N\tau}$ beyond $\alpha_d$:

• For walk-SAT without restarts, we start from a typical initial condition $\alpha_u(0) = \alpha/2^K$ (for K-SAT) resp. $\alpha/2$
  (for K-XOR-SAT), and we wait until the system reaches $\alpha_u(t) = 0$. This typically does not happen for finite
  times, i.e. the solution time is given by the exponent

  $$
  \tau = -\lim_{t_f\to\infty}\,\lim_{N\to\infty}\,\frac{1}{N}\,\ln P(\alpha_u(0) \to \alpha_u(t_f) = 0).
  \tag{22}
  $$

  The solution time is thus, in general, exponentially large in $N$. Note that the order of limits in the above
  expression is relevant: $t_f$ measures only a finite MC time scale. With interchanged limits, the right-hand
  side would vanish, since the algorithm finds a solution after exponential times $t_f$ with probability one.

• For walk-SAT with restarts, the situation changes slightly. Let us assume that the algorithm stops every $t_f N$
  walk-SAT iterations and re-initializes the variables randomly. In this case, we have to take into account two
  distinct rare events: First, the starting point may be close to a solution, i.e. $\alpha_u(0)$ is atypically small. This
  happens with probability $\rho(\alpha_u(0)) \sim e^{-N s(\alpha_u(0))}$, where $s(\alpha_u(0))$ is the micro-canonical entropy for the
  energy density $\alpha_u(0)$. $\rho(\alpha_u(0))$ tends to one for the typical starting point discussed in the previous item, and
  becomes exponentially small for smaller initial energies. Second, this may be balanced by the fact that finding a
  solution after some given time $t_f$ becomes more probable for smaller initial energies. From the probability of
  finding a solution after a single restart, $\rho(\alpha_u(0))\,\max_{0<t<t_f} P(\alpha_u(0) \to \alpha_u(t) = 0)$, we can read off the
  number of restarts $t_{sol} = e^{N\tau}$ needed to find a solution with high probability:

  $$
  \tau = -\max_{\alpha_u(0)}\,\lim_{N\to\infty}\,\frac{1}{N}\,\ln\left[\rho(\alpha_u(0))\,\max_{0<t<t_f} P(\alpha_u(0) \to \alpha_u(t) = 0)\right].
  \tag{23}
  $$



Our aim is thus to calculate the large-deviation functional determining $P(\alpha_u(0) \to \alpha_u(t_f) = 0)$. As we will see,
this can be done only within the Poissonian approximation, i.e. throughout this section we assume

$$
p_t(s,u) = e^{-K\alpha}\,\frac{(K\alpha_s(t))^{s}\,(K\alpha_u(t))^{u}}{s!\,u!}
\tag{24}
$$

with $\alpha_u(t) + \alpha_s(t) = \alpha$ being time-independent.

A. Random K-XOR-SAT

Here we discuss only an algorithm without greedy steps, where the above approximation works reasonably well.
Therefore $p_t^{(\mathrm{flip})}(s,u)$ is given by equation (??). The number of unsatisfied clauses in a formula at time $t$ is given
by $N\alpha_u(t)$. This number changes by $\Delta e = s - u$ in the next step if a variable with $u$ unsatisfied and $s$ satisfied
clauses is flipped. The probability $P_t(\Delta e)$ of a given energy shift $\Delta e$ in a single step is consequently given by

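$$
\begin{aligned}
P_t(\Delta e) &= \sum_{s,u=0}^{\infty} p_t^{(\mathrm{flip})}(s,u)\,\delta_{\Delta e,\,s-u} \\
&= \sum_{s,u=0}^{\infty} e^{-K\alpha}\,\frac{(K\alpha_s(t))^{s}\,(K\alpha_u(t))^{u}}{s!\,u!}\,\frac{u}{K\alpha_u(t)}\,\delta_{\Delta e,\,s-u} \\
&= \sum_{s,u'=0}^{\infty} e^{-K\alpha}\,\frac{(K\alpha_s(t))^{s}\,(K\alpha_u(t))^{u'}}{s!\,u'!}\,\delta_{\Delta e,\,s-u'-1}.
\end{aligned}
\tag{25}
$$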
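
As a quick numerical check of Eq. (25), the single-step distribution can be tabulated directly. The following is only an illustrative sketch: the function name and the truncation `n_max` of the Poissonian sums are assumptions made here, not part of the original analysis. The mean of the distribution should equal $K\alpha_s - K\alpha_u - 1$, which is the drift of the typical trajectory.

```python
from math import exp, factorial


def energy_shift_distribution(alpha_s, alpha_u, K=3, n_max=40):
    """Tabulate P_t(delta_e) of Eq. (25) as a dict {delta_e: probability}.

    The sums over s and u' = u - 1 are truncated at n_max (an assumption
    that is harmless as long as K*alpha_s and K*alpha_u are of order one).
    """
    lam_s, lam_u = K * alpha_s, K * alpha_u
    dist = {}
    for s in range(n_max + 1):
        for u1 in range(n_max + 1):  # u1 = u - 1 >= 0
            weight = (exp(-lam_s - lam_u) * lam_s**s * lam_u**u1
                      / (factorial(s) * factorial(u1)))
            delta_e = s - u1 - 1
            dist[delta_e] = dist.get(delta_e, 0.0) + weight
    return dist


if __name__ == "__main__":
    dist = energy_shift_distribution(alpha_s=0.5, alpha_u=0.25)
    mean = sum(de * prob for de, prob in dist.items())
    print("normalization:", sum(dist.values()), "mean shift:", mean)
```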

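The probability $P_{\Delta T}(\Delta E)$ of a change of the number of unsatisfied clauses by $\Delta E$ after $\Delta T$ steps is given by
the convolution of the single-step probabilities. For $\Delta T = N\delta t$ with small $\delta t \ll 1$, the energy density $\alpha_u(t)$ and
thus $P_t(\Delta e)$ are almost time-independent, so we get in Fourier space

$$
\begin{aligned}
\tilde P_{\Delta T}(l) = \left[\tilde P_t(l)\right]^{\Delta T}
&= \left[\sum_{s,u'=0}^{\infty} e^{-K\alpha}\,\frac{(K\alpha_s(t))^{s}\,(K\alpha_u(t))^{u'}}{s!\,u'!}\,e^{-il(s-u'-1)}\right]^{\Delta T} \\
&= \left[\exp\left(-K\alpha + K\alpha_s(t)\,e^{-il} + K\alpha_u(t)\,e^{il} + il\right)\right]^{\Delta T}.
\end{aligned}
\tag{26}
$$

Switching again to intensive quantities, we have $\Delta E = N\dot\alpha_u(t)\,\delta t$ and thus

$$
P_{\Delta T}(\Delta E) = \int\frac{dl}{2\pi}\,\exp\left\{N\delta t\left(il\,\dot\alpha_u(t) - K\alpha + K(\alpha - \alpha_u(t))\,e^{-il} + K\alpha_u(t)\,e^{il} + il\right)\right\}
\tag{27}
$$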

as the probability of the algorithm for getting from energy density $\alpha_u(t) = E/N$ at time $t$ to energy density
$(E + \Delta E)/N$ at time $t + \Delta T$. To calculate the transition probability between $\alpha_u(0)$ and arbitrary $\alpha_u(t_f)$ after
linear time $t_f N$, we write it as a composition of many small intervals $\delta t$. We then get the transition probability by
integrating over all possible paths $\alpha_u(t)$ going from $\alpha_u(0)$ to $\alpha_u(t_f)$. By this step the conjugate variable $l$
also becomes a time-dependent function $l(t)$,

$$
P(\alpha_u(0) \to \alpha_u(t_f)) = \int_{\alpha_u(0)}^{\alpha_u(t_f)} \mathcal{D}\alpha_u(t) \int \mathcal{D}l(t)\,
\exp\left\{-N\int_0^{t_f} dt\,\mathcal{L}(l(t),\alpha_u(t),\dot\alpha_u(t))\right\},
\tag{28}
$$

where the Lagrangian $\mathcal{L}$ is given by

$$
\mathcal{L}(l(t),\alpha_u(t),\dot\alpha_u(t)) = -il(t)\,\dot\alpha_u(t) + K\alpha - K(\alpha - \alpha_u(t))\,e^{-il(t)} - K\alpha_u(t)\,e^{il(t)} - il(t).
\tag{29}
$$

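We can replace the integral by its saddle point in the thermodynamic limit. Since $l(t)$ is not a dynamic variable
($\dot l(t)$ does not appear in the Lagrangian) we find

$$
0 = -\frac{\partial\mathcal{L}}{\partial l(t)} = i\dot\alpha_u(t) - iK(\alpha - \alpha_u(t))\,e^{-il(t)} + iK\alpha_u(t)\,e^{il(t)} + i.
\tag{30}
$$

The saddle point in $\alpha_u(t)$ is given by the Euler-Lagrange equation

$$
0 = \frac{\partial\mathcal{L}}{\partial\alpha_u} - \frac{d}{dt}\,\frac{\partial\mathcal{L}}{\partial\dot\alpha_u} = i\dot l(t) + K e^{-il(t)} - K e^{il(t)}.
\tag{31}
$$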

We are, in particular, interested in trajectories leading to a solution of the formula, i.e. trajectories starting at some
$\alpha_u(0)$ and going to $\alpha_u(t_f) = 0$ after some given final time $t_f$. This results in a set of two coupled first-order
non-linear differential equations for $\alpha_u(t)$ and $l(t)$ with two boundary conditions given for $\alpha_u(t)$, and none for
$l(t)$. By substituting $\kappa(t) = e^{il(t)}$ the equations read

$$
\begin{aligned}
\dot\alpha_u(t) &= -1 - K\alpha_u(t)\,\kappa(t) + K\,\frac{\alpha - \alpha_u(t)}{\kappa(t)}, \\
\dot\kappa(t) &= K\kappa^2(t) - K.
\end{aligned}
\tag{32}
$$

A trivial solution of the second equation, $\kappa(t) \equiv 1$, leads to $\dot\alpha_u(t) = -1 - K\alpha_u(t) + K(\alpha - \alpha_u(t))$, which is
exactly the equation for the typical trajectory given by (10). Indeed we have $\mathcal{L}(\kappa = 1, \alpha_u, \dot\alpha_u) = 0$, so this
trajectory has probability 1 in the thermodynamic limit.
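
For $\kappa = 1$ the equation for $\alpha_u(t)$ is linear, $\dot\alpha_u = K\alpha - 1 - 2K\alpha_u$, and can be solved in closed form. The sketch below is an illustration derived here from that equation, with hypothetical function names; it assumes the typical start $\alpha_u(0) = \alpha/2$ for the solution time. It shows the plateau $(K\alpha - 1)/(2K)$ for $\alpha > 1/K$ and a solution time that diverges logarithmically as $\alpha \to 1/K$ from below.

```python
from math import exp, log


def plateau_energy(alpha, K=3):
    """Fixed point of d(alpha_u)/dt = K*alpha - 1 - 2*K*alpha_u.

    It is positive (a true plateau) only for alpha > 1/K.
    """
    return (K * alpha - 1.0) / (2.0 * K)


def typical_energy(t, alpha, alpha_u0, K=3):
    """Typical Poissonian trajectory alpha_u(t) started at alpha_u0."""
    fixed = plateau_energy(alpha, K)
    return fixed + (alpha_u0 - fixed) * exp(-2.0 * K * t)


def linear_solution_time(alpha, K=3):
    """MC time at which the typical trajectory started at alpha/2 hits zero."""
    if K * alpha >= 1.0:
        raise ValueError("no finite solution time: alpha is not below 1/K")
    return -log(1.0 - K * alpha) / (2.0 * K)


if __name__ == "__main__":
    print("plateau at alpha = 2/3:", plateau_energy(2.0 / 3.0))
    for alpha in (0.2, 0.3, 0.33):
        print(alpha, linear_solution_time(alpha))
```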

This solution is, however, not stable, since we have $\dot\kappa(t) < 0$ for $\kappa(0) < 1$ and $\dot\kappa(t) > 0$ for $\kappa(0) > 1$, i.e. the
trajectory deviates from the typical one once $\kappa$ deviates from 1. We can, however, solve the equations for XOR-SAT
in this Poissonian approximation and get

$$
\begin{aligned}
\kappa(t) &= \frac{1 + A e^{2Kt}}{1 - A e^{2Kt}}, \\
\alpha_u(t) &= \alpha_u(0)\,e^{-2Kt}\,\frac{1 - A^2 e^{4Kt}}{1 - A^2}
+ \int_0^t d\tau\left(-1 + K\alpha\,\frac{1 - A e^{2K\tau}}{1 + A e^{2K\tau}}\right) e^{-2K(t-\tau)}\,\frac{1 - A^2 e^{4Kt}}{1 - A^2 e^{4K\tau}}.
\end{aligned}
\tag{33}
$$

FIG. 12: 3-XOR-SAT at $\alpha = 2/3$: Energy densities $\alpha_u(t)$ for various initial conditions $\alpha_u(0)$ and solution times $t_f$.
The system first equilibrates to a plateau which is independent of the initial condition, and finally solves the SAT
formula by a macroscopic fluctuation.



In principle the integrals in the second expression can also be carried out analytically, but we failed to find a compact
representation of the result. The solution still contains the unknown constant $A$, which has to be adjusted to meet
the final condition $\alpha_u(t_f) = 0$. We have observed that $A$ is slightly smaller than $e^{-2Kt_f}$, but it is easier to
determine $t_f(A)$ than its inverse $A(t_f)$.
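
Determining $t_f(A)$ can be sketched numerically as follows. This is an illustrative sketch with hypothetical names, not the procedure used for the figures: it takes $\kappa(t) = (1 + Ae^{2Kt})/(1 - Ae^{2Kt})$, writes the linear equation for $\alpha_u(t)$ with the integrating factor $x/(1 - x^2)$, $x = Ae^{2Kt}$, evaluates the integral by the trapezoidal rule on a uniform grid, and returns the first grid time at which $\alpha_u$ becomes non-positive. It assumes $0 < A < 1$.

```python
import numpy as np


def solution_time(A, alpha, alpha_u0, K=3, n=400_000):
    """Return (t_f, t, alpha_u) for the large-deviation trajectory with constant A.

    t_f is the first grid time where alpha_u(t) <= 0; the grid stops just
    before t_star = -ln(A)/(2K), where kappa(t) diverges.
    """
    t_star = -np.log(A) / (2 * K)
    t = np.linspace(0.0, t_star, n, endpoint=False)
    x = A * np.exp(2 * K * t)
    weight = x / (1 - x**2)                        # integrating factor
    source = -1 + K * alpha * (1 - x) / (1 + x)    # -1 + K*alpha/kappa(t)
    f = weight * source
    # Cumulative trapezoidal integral of f from 0 to t.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(t))))
    alpha_u = (weight[0] * alpha_u0 + cum) / weight
    hits = np.nonzero(alpha_u <= 0)[0]
    if hits.size == 0:
        raise ValueError("alpha_u did not reach zero on the grid; increase n")
    i = hits[0]
    return t[i], t[: i + 1], alpha_u[: i + 1]


if __name__ == "__main__":
    t_f, t, a = solution_time(A=1e-6, alpha=2.0 / 3.0, alpha_u0=1.0 / 3.0)
    print("t_f =", t_f, " A * exp(2 K t_f) =", 1e-6 * np.exp(6 * t_f))
```

Scanning `A` over several orders of magnitude and inverting the resulting table is one simple way to obtain $A(t_f)$.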

The trajectories show an interesting behavior, cf. figure 12. After about 1 MC sweep the energy reaches a plateau
independent of the starting energy density $\alpha_u(0)$. The plateau value is the same as given by the typical trajectory and
almost independent of the time $t_f$ at which the solution is found. The energy drops to 0 suddenly about 1 MC sweep
before $t_f$. This is similar to the qualitative picture we observed numerically in Sec. III. The system first equilibrates
and then, by means of an exponentially improbable fluctuation, reaches zero energy, cf. fig. ??. The fluctuations which
are present in the numerical data cannot be seen in the analytical curve, because the latter gives an average over all
possible trajectories under the condition that $\alpha_u = 0$ is reached for the first time at $t_f$; only the very last
fluctuation leading to the solution is common to all possible numerical trajectories.

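To calculate the probability that the algorithm, starting at some $\alpha_u(0)$, finds a solution after time $t_f$, we have to
calculate the action

$$
\begin{aligned}
S\big(\mathcal{L}(\kappa(t),\alpha_u(t),\dot\alpha_u(t))\big) &= \int_0^{t_f} dt\,\mathcal{L}(\kappa(t),\alpha_u(t),\dot\alpha_u(t)) \\
&= \int_0^{t_f} dt\left(-\log(\kappa(t))\,(\dot\alpha_u(t) + 1) + K\alpha - K\,\frac{\alpha - \alpha_u(t)}{\kappa(t)} - K\alpha_u(t)\,\kappa(t)\right),
\end{aligned}
\tag{34}
$$

using solution (33). The evaluation is simplified by plugging in the saddle-point equations in order to eliminate
$\dot\alpha_u(t)$,

$$
S\big(\mathcal{L}(\kappa(t),\alpha_u(t),\dot\alpha_u(t))\big) = K\int_0^{t_f} dt\left(-\log(\kappa(t))\left(-\alpha_u(t)\,\kappa(t) + \frac{\alpha - \alpha_u(t)}{\kappa(t)}\right) + \alpha - \frac{\alpha - \alpha_u(t)}{\kappa(t)} - \alpha_u(t)\,\kappa(t)\right).
\tag{35}
$$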



The results are shown in Fig. 13 for different values of the initial condition $\alpha_u(0)$ and different solution times. For
the typical initial condition $\alpha_u(0) = \alpha/2$ we find a monotonically decreasing function which has practically reached
its asymptotic value for $t_f > 1$. From equation (28), the probability that the algorithm finds a solution is given by

$$
P(\alpha_u(0) \to \alpha_u(t_f) = 0) = \exp\{-NS\},
\tag{36}
$$

and the typical solution time of the algorithm without restarts is given by Eq. (22),

$$
t_{sol} = \lim_{t_f\to\infty} e^{NS}.
\tag{37}
$$

FIG. 13: 3-XOR-SAT at $\alpha = 2/3$: Action $S$ as a function of the resolution time $t_f$, for initial conditions
$\alpha_u(0) = 0.02, 0.06, 0.1, 0.34$, from bottom to top. The inset shows the logarithm of the predicted solution time for the
same values $\alpha_u(0)$, but now from top to bottom.



We also observe that, for smaller than typical $\alpha_u(0)$, the action shows a pronounced minimum for small solution
times. This minimum corresponds to trajectories which start close to a solution (small $\alpha_u(0)$) and go more or less
directly to this solution (small $t_f$). As discussed in the beginning of this section, the algorithm might profit from
this by using random restarts. Taking the entropy as calculated in Ref. [26], we however find that the minimum in $S$ is
over-compensated by the small entropy of low-energy starting configurations, cf. the inset of Fig. 13 where
$\tau(\alpha_u(0), t_f) = -\frac{1}{N}\ln(\rho(\alpha_u(0))\,P)$ is presented. The minimum of $\tau$ is still found for the typical starting
configuration, and it is related to the typical running time by $t_{sol} = \min e^{N\tau(\alpha_u(0), t_f)}$. Here it coincides with
the solution time without restarts.




FIG. 14: 3-XOR-SAT: Solution time $t_{sol}$ for Schöning's algorithm (only walk steps, random restarts after $t_f N = 3N$
steps), measured as the number of restarts, as a function of $\alpha$. The analytical result is given by the full line.
Numerical data for $N = 30, 50, 70$ (dots, squares, diamonds; lines are guides to the eye only) seem to indicate much
smaller solution times. The inset shows, however, that there are huge finite-size effects for $\alpha = 0.4, 0.42$ (crosses,
stars). The analytical estimates for the corresponding solution times are $\ln(t_{sol})/N \simeq 0.0061, 0.0099$.



In Fig. 14 the resulting solution time is compared to numerical data obtained using the algorithm with random
restarts after $3N$ iterations. Due to the exponential behavior, only small systems up to $N = 70$ could be investigated in
the full satisfiable region. The resulting running times seem to be much smaller than the analytical predictions. There
are, however, huge finite-size effects. In the inset we show numerical data for $\alpha = 0.4$ and $0.42$, where the exponent
is still small enough that systems up to $N = 1000$ can be easily solved. Evidently, the asymptotic running time cannot
be reasonably estimated even from such large systems. On the other hand, the qualitative behavior is well represented
by the analytical curve, in particular the sub-linear slope close to the threshold. The analytical curve suggests
$\ln t_{sol} \sim \sqrt{\alpha - \alpha_d}$. Another interesting observation is that, at the SAT/UNSAT threshold $\alpha_c = 0.918$, the
analytical prediction for the solution-time exponent is 0.249, which is smaller than but quite close to Schöning's
rigorous worst-case upper bound $\ln(4/3) \simeq 0.288$.
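
The algorithm with walk steps only and random restarts after $3N$ flips can be sketched as follows, here written for K-SAT clauses. This is a minimal illustration, not the simulation code behind the figures. The helper `random_satisfiable_formula` is a hypothetical test generator that keeps only clauses satisfied by a hidden assignment, so it produces a planted ensemble rather than the uniformly random formulas studied in the text; all names are assumptions.

```python
import random


def is_satisfied(clause, x):
    """A clause is a list of (variable, sign); it holds if some x[v] == sign."""
    return any(x[v] == sign for v, sign in clause)


def random_satisfiable_formula(n, m, K=3, rng=random):
    """Hypothetical helper: m random K-clauses that a hidden assignment satisfies."""
    hidden = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        clause = [(v, rng.random() < 0.5) for v in rng.sample(range(n), K)]
        if is_satisfied(clause, hidden):
            clauses.append(clause)
    return clauses


def walksat_with_restarts(clauses, n, max_restarts=1000, rng=random):
    """Walk steps only; restart from a random assignment after 3n flips.

    Returns (assignment, number_of_restarts_used), or (None, max_restarts).
    """
    for restart in range(1, max_restarts + 1):
        x = [rng.random() < 0.5 for _ in range(n)]
        for flips in range(3 * n + 1):
            unsat = [c for c in clauses if not is_satisfied(c, x)]
            if not unsat:
                return x, restart
            if flips < 3 * n:
                # Pick a random unsatisfied clause, then a random variable in it.
                v, _sign = rng.choice(rng.choice(unsat))
                x[v] = not x[v]
    return None, max_restarts


if __name__ == "__main__":
    rng = random.Random(0)
    formula = random_satisfiable_formula(20, 40, rng=rng)
    assignment, restarts = walksat_with_restarts(formula, 20, rng=rng)
    print("restarts needed:", restarts)
```

Recomputing the list of unsatisfied clauses at every flip keeps the sketch short; for large $N$ one would instead update the unsatisfied set incrementally.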



B. Random K-SAT 



The same type of analysis can be done for the case of random K-SAT. The main difference is, as mentioned
already in Sec. IV, that a satisfied clause does not necessarily become unsatisfied when one of its variables is flipped.
This happens only if the clause is satisfied solely by the variable to be flipped, which is one of the $2^K - 1$ satisfying
assignments to this clause. We again use the assumption that the variables in one clause are uncorrelated and assume
that a clause becomes unsatisfied with probability $\mu = 1/(2^K - 1)$. In analogy to the discussion above we conclude
that the probability that a variable flip leads to a given energy change $\Delta e$ is given by

$$
P_t(\Delta e) = \sum_{s,u=0}^{\infty} p_t^{(\mathrm{flip})}(s,u) \sum_{k=0}^{s} \binom{s}{k}\,\mu^{k}(1-\mu)^{s-k}\,\delta_{\Delta e,\,k-u},
\tag{38}
$$



where $k$ sums over all possible numbers of clauses which become unsatisfied in the considered algorithmic step.
Concentrating again on the pure walk algorithm without greedy steps, i.e. on $q = 1$, we can go through the same
procedure as for K-XOR-SAT. The transition probability from some initial to some final density of unsatisfied clauses
is, in the Poissonian approximation (24), given by the path integral

$$
P(\alpha_u(0) \to \alpha_u(t_f)) = \int_{\alpha_u(0)}^{\alpha_u(t_f)} \mathcal{D}\alpha_u(t) \int \mathcal{D}\kappa(t)\,
\exp\left\{-N\int_0^{t_f} dt\,\mathcal{L}(\kappa(t),\alpha_u(t),\dot\alpha_u(t))\right\}
\tag{39}
$$

with the modified Lagrangian

$$
\mathcal{L}(\kappa(t),\alpha_u(t),\dot\alpha_u(t)) = -(1 + \dot\alpha_u(t))\,\log(\kappa(t)) + K\alpha - K(\alpha - \alpha_u(t))\left(1 - \mu + \frac{\mu}{\kappa(t)}\right) - K\alpha_u(t)\,\kappa(t).
\tag{40}
$$



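The saddle-point equations are given by

$$
\begin{aligned}
\dot\alpha_u(t) &= -1 - K\alpha_u(t)\,\kappa(t) + K\mu\,\frac{\alpha - \alpha_u(t)}{\kappa(t)}, \\
\dot\kappa(t) &= K\kappa^2(t) - K(1-\mu)\,\kappa(t) - K\mu.
\end{aligned}
\tag{41}
$$

Their solution dominates, for $N \to \infty$, the path integral (39) and is given by the generalization of Eqs. (33):

$$
\begin{aligned}
\kappa(t) &= \frac{1 + \mu A e^{(1+\mu)Kt}}{1 - A e^{(1+\mu)Kt}}, \\
\alpha_u(t) &= \alpha_u(0)\,e^{-(1+\mu)Kt}\,\frac{\left(1 - A e^{(1+\mu)Kt}\right)\left(1 + \mu A e^{(1+\mu)Kt}\right)}{(1 - A)(1 + \mu A)} \\
&\quad + \int_0^t d\tau\left(-1 + \mu K\alpha\,\frac{1 - A e^{(1+\mu)K\tau}}{1 + \mu A e^{(1+\mu)K\tau}}\right) e^{-(1+\mu)K(t-\tau)}\,
\frac{\left(1 - A e^{(1+\mu)Kt}\right)\left(1 + \mu A e^{(1+\mu)Kt}\right)}{\left(1 - A e^{(1+\mu)K\tau}\right)\left(1 + \mu A e^{(1+\mu)K\tau}\right)}.
\end{aligned}
\tag{42}
$$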



The results for the typical trajectories leading to a solution after some given final time $t_f$ are presented in Fig.
15. They show the same qualitative behavior as for K-XOR-SAT, with a slightly slower convergence towards the
equilibrium due to the reduced exponential factor $e^{-(1+\mu)Kt}$. The action calculated for the trajectories also shows a
behavior similar to that for K-XOR-SAT, cf. Fig. 16. The exponentially dominant contribution to the typical solution
time is again given by $t_{sol} \sim \lim_{t_f\to\infty} e^{NS(t_f)}$.
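
The K-SAT saddle-point equations can be checked numerically in a few lines. The sketch below is illustrative only and uses hypothetical function names: it takes the closed form $\kappa(t) = (1 + \mu A e^{(1+\mu)Kt})/(1 - A e^{(1+\mu)Kt})$ and compares its finite-difference derivative with the right-hand side $K\kappa^2 - K(1-\mu)\kappa - K\mu$; it also evaluates the fixed point of the $\alpha_u$ equation at $\kappa = 1$, which gives the Poissonian plateau $(K\mu\alpha - 1)/(K(1+\mu))$ and hence a Poissonian threshold $(2^K - 1)/K$.

```python
from math import exp


def mu_of(K):
    """Naive probability that a satisfied clause becomes unsatisfied by a flip."""
    return 1.0 / (2**K - 1)


def kappa(t, A, K=3):
    """Closed-form kappa(t) for K-SAT (reduces to the XOR-SAT form for mu = 1)."""
    mu = mu_of(K)
    x = A * exp((1.0 + mu) * K * t)
    return (1.0 + mu * x) / (1.0 - x)


def kappa_rhs(kap, K=3):
    """Right-hand side of the saddle-point equation for kappa."""
    mu = mu_of(K)
    return K * kap**2 - K * (1.0 - mu) * kap - K * mu


def poissonian_plateau(alpha, K=3):
    """Fixed point of the alpha_u equation at kappa = 1 (negative below threshold)."""
    mu = mu_of(K)
    return (K * mu * alpha - 1.0) / (K * (1.0 + mu))


if __name__ == "__main__":
    t, A, h = 0.3, 0.01, 1e-6
    numeric = (kappa(t + h, A) - kappa(t - h, A)) / (2 * h)
    print("d kappa/dt:", numeric, " rhs:", kappa_rhs(kappa(t, A)))
    print("plateau at alpha = 3.5:", poissonian_plateau(3.5))
```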

In Fig. 17 we finally compare the predicted typical solution time with numerical simulations. Close to the dynamical
threshold, the numerical running times are much smaller, which can be explained already by the fact that the
Poissonian approximation under-estimates $\alpha_d$. For larger $\alpha$, the numerical data cross the analytical approximation,
but both stay well below Schöning's bound. This is to be expected, since there is an exponential number of possible
solutions, while Schöning assumes only the existence of a single one. Note that the solution times are exponentially
smaller for 3-SAT than for random 3-XOR-SAT.




FIG. 15: 3-SAT at $\alpha = 3.5$: Energy densities $\alpha_u(t)$ for various initial conditions $\alpha_u(0)$ and solution times $t_f$. The
system first equilibrates to a plateau which is independent of the initial condition, and finally solves the SAT formula
by a macroscopic fluctuation.

FIG. 16: 3-SAT at $\alpha = 3.5$: Action $S$ as a function of the resolution time $t_f$, for initial conditions
$\alpha_u(0)/\alpha = 0.1, 0.3, 0.5, 0.13$, from bottom to top.



VI. CONCLUSION AND OUTLOOK 



FIG. 17: 3-SAT: Solution time $t_{sol}$ for Schöning's algorithm (only walk steps, random restarts after $t_f N = 3N$ steps),
measured as the number of restarts, as a function of $\alpha$. The analytical result is given by the full line. Numerical data
for $N = 30, 50, 70$ (squares, dots, diamonds) this time cross the analytical prediction. Note that the solution times are
smaller than for 3-XOR-SAT.

In this paper, we have presented an approximate analytical approach to describe the dynamical behavior of a class
of stochastic local search algorithms applied to random K-satisfiability and K-XOR-satisfiability problems. We have
seen that there are two distinct dynamical phases:



• For clause-to-variable ratio $\alpha < \alpha_d$ (with $\alpha_d$ being algorithm- and problem-dependent), the algorithm is able
  to solve almost all instances in linear time. In this regime, the dynamics was studied using a simple rate-equation
  approach which was able to capture the most important features of the average trajectory taken by the system
  under the action of the algorithm.

• For $\alpha > \alpha_d$, typical solution times were found to scale exponentially with the system size given by the number
  of variables $N$. This behavior could be understood analytically using a functional-integral approach to evaluate
  the probability of large deviations from the typical trajectory. We found the following behavior: The system
  equilibrates very fast to a non-zero plateau in the number of unsatisfied clauses. Then the system only fluctuates
  around this plateau. This goes on until an exponentially improbable macroscopic fluctuation towards one of the
  solutions appears, and the algorithm stops. The small probability of these fluctuations explains the exponentially
  large waiting times until a satisfying assignment is reached.

For the exponential-time regime, only a Poissonian approximation was used. In principle it would be possible to go
beyond this ansatz using the full distribution $p_t(s,u)$ of vertices with $s$ satisfied and $u$ unsatisfied clauses. Following
the same scheme as in the Poissonian approach, we reach a system of first-order differential equations for all $p_t(s,u)$
and their conjugate parameters $\kappa_t(s,u)$. Since these equations are non-linear, it is far from obvious how to construct
an analytical solution. The numerical integration of these equations is also a hard problem: For the $p_t(s,u)$ there
are initial and final conditions, whereas the $\kappa_t(s,u)$ have no boundary condition at all. The question whether it is
possible to follow this improved approach is still under investigation.

Another possible extension of this work concerns the application to different heuristics like GSAT, which was
discussed in the second section. The analytical approach can serve as a basis for evaluating the relative performance of
different heuristics and, as a consequence of the insight gained, also as a step towards a systematic improvement of
stochastic local search.

A third point which remains open is the question of how far the solution-space structure influences the performance of
walk-SAT. As discussed in the beginning of the paper, random K-SAT and random K-XOR-SAT undergo a clustering
transition deep inside the satisfiable phase. Below this transition, all solutions are collected in one huge cluster;
above it, an exponential number of such clusters exists. The clustering transition is also connected to a proliferation
of metastable states which are expected to cause problems for any local algorithm. However, in our approach to the
walk-SAT dynamics, we do not see any sign of a direct impact of this transition on the performance of the algorithms
under consideration. The onset of exponential solution times is found to be inside the unclustered phase. It thus
remains an open problem whether the clustering transition can be approached by using improved heuristic criteria.